(54) A method and apparatus for interactive language instruction



(57) A method and apparatus for interactive language instruction is
provided that displays text files for processing, provides key features
and functions for interactive learning, displays facial animation, and
provides a workspace for language building functions. The system
includes a stored set of language rules as part of the text-to-speech
sub-system, as well as another stored set of rules as applied to the
process of learning a language. The method implemented by the system
includes digitally converting text to audible speech, providing the
audible speech to a user or student (with the aid of an animated image
in selected circumstances), prompting the student to replicate the
audible speech, comparing the student's replication with the audible
speech provided by the system, and providing feedback and reinforcement
to the student by, for example, selectively recording or playing back
the audible speech and the student's replication.
Description
Background Of The Invention

[0001] This invention relates to a method and apparatus for interactive
language instruction. More particularly, the invention is directed to a
multi-media and multi-modal computer application that displays text
files for processing, provides features and functions for interactive
learning, displays facial animation, and provides a workspace for
language building functions. The system includes a stored set of
language rules as part of the text-to-speech sub-system, as well as
another stored set of rules as applied to the process of learning a
language. The method implemented by the system includes digitally
converting text to audible speech, providing the audible speech to a
user or student (with the aid of an animated image in selected
circumstances), prompting the student to replicate the audible speech,
comparing the student's replication with the audible speech provided by
the system, conducting performance analysis on the speech (utterance)
and providing feedback and reinforcement to the student by, for example,
selectively recording or playing back the audible speech and the
student's replication.
[0002] While the invention is particularly directed to the art of
interactive language instruction, and will be thus described with
specific reference thereto, it will be appreciated that the invention
may have usefulness in other fields and applications. For example, the
invention may be used to teach general speech skills to individuals with
speech challenges or may be used to train singers to enhance vocal
skills.

[0003] By way of background, interactive language instruction programs
are known. For example, U.S. Patent No. 5,634,086 to Rtischev et al. is
directed to a spoken language instruction method and apparatus employing
context based speech recognition for instruction and evaluation.
However, such known language instruction systems require the use of
recorded speech as a model with which to compare a student's attempts to
speak a language sought to be learned.
[0004] Work involved with preparing the lesson as recorded speech (such
as preparing a script) includes recording phrases, words, etc., creating
illustrations, photographs, video, or other media, and linking the sound
files with the images and with the content of the lessons or providing
large databases of alternative replies in dialogue systems which are
designed to replicate interactions with students for context based
lessons, etc.

[0005] Moreover, language students may be interested in learning words,
phrases, and context of a particular interest such as industry specific
terms from their workplace (computer industry, communications, auto
repair, etc.). Producing such special content is difficult using
recorded speech for the language lesson.
[0006] Other difficulties with using recorded speech in this context are
numerous. The quality of the recording medium may present problems. In
this regard, an excessive amount of background noise in the recording
may affect the quality thereof. In addition, recorded speech is subject
to many other factors that may undesirably enter the speech model. For
example, recorded speech may include speaker accents resulting from the
speaker being a native of a particular geographic area. Likewise,
recorded speech may reflect a particular emotional state of the speaker
such as whether the speaker is tired or upset. As a result, in any of
these circumstances, as well as others, the shortcomings of recorded
speech make it more difficult for a student to learn a language lesson.

[0007] A few products exist which allow users to 
process files of text to be read aloud by synthesized or 
recorded speech technologies. These products are 
commonly known as text-to-speech engines. See, for 
example, U.S. Patent No. 5,751,907 to Moebius et al. 
(issued May 12, 1998) and U.S. Patent No. 5,790,978 to 
Olive et al. (issued August 4, 1998), both of which are 
incorporated herein by reference. Some existing prod- 
ucts also allow users to add words to a dictionary, make 
modifications to word pronunciations in the dictionary, 
or modify the sound created by a text-to-speech engine. 
See, for example, EP application no. 00303371.9.
[0008] Voice or speech recognition systems are also known. These systems
use a variety of techniques for recognizing speech patterns including
utterance verification or verbal information verification (VIV), for
which a variety of patents owned by Lucent Technologies have been
applied for and/or issued. Among these commonly assigned
patents/applications are U.S. Patent No. 5,797,123 to Chou et al. (filed
December 20, 1996; issued August 18, 1998); EP-A-892 387; EP-A-892 388;
and U.S. Patent No. 5,649,057 to Lee et al. (filed January 16, 1996;
issued July 15, 1997).
[0009] It would be desirable to have available an 
interactive language instruction program that did not 
rely exclusively on recorded speech and utilized reliable 
speech recognition technology, such as that which 
incorporates utterance verification or verbal information
verification (VIV). It would also be desirable to evaluate 
a speaker's utterance with predictive models in the 
absence of a known model. The system would provide 
a confidence measure against any acoustic model from 
which a score can be derived. It would also be desirable 
to have available such a system that selectively incorpo- 
rates facial animation to assist a student in the learning 
process. 

[0010] The present invention contemplates a new 
and improved interactive language instructor which 
resolves the above-referenced difficulties and others. 

Summary Of The Invention 

[0011] A method and apparatus for voice interactive
language instruction is provided. 
[0012] In one aspect of the invention, a system comprises a first module
configured to digitally convert input text to audible speech in a
selected language, a user interface positioned to receive utterances
spoken by a user in attempting to replicate the audible speech, and a
second module configured to recognize the utterances and provide
feedback to the user as to an accuracy at which the user replicates the
speech in the selected language based on a comparison of the utterances
to the audible speech, any acoustic model, predictive models, phoneme
models, diphone models, or dynamically generated models.
[0013] In a more limited aspect of the invention, a third module is
provided which is synchronized to the first module and which provides an
animated image of a human face and head pronouncing the audible speech.
[0014] In another aspect of the invention, the animated image of the
face and human head portrays a transparent face and head.

[0015] In another aspect of the invention, the animated image of the
face and human head portrays a three dimensional perspective and the
image can be rotated, tilted, etc. for full view from various angles.
[0016] In another aspect of the invention, the first and third modules
further include controls to control one of volume, speed, and vocal
characteristics of the audible speech and the animated image.
[0017] In another aspect of the invention, the model is one of a
predictive model, a phoneme model, a diphone model, and a dynamically
generated model.
[0018] In another aspect of the invention, the first module includes
files storing model pronunciations for the words or sub-words comprising
the input text.
[0019] In another aspect of the invention, the system comprises lesson
files upon which the input text is based.

[0020] In another aspect of the invention, the input 
text is based on data received from a source outside of 
the system.

[0021] In another aspect of the invention, the system further includes
dictionary files.
[0022] In another aspect of the invention, the system further comprises
a record and playback module.
[0023] In another aspect of the invention, the system includes a table
storing mapping information between word subgroups and vocabulary words.
[0024] In another aspect of the invention, the sys- 
tem includes a table for storing mapping information 
between words and vocabulary words. 
[0025] In another aspect of the invention, the system includes a table
for storing mapping information between words and examples of parts of
speech.
[0026] In another aspect of the invention, the sys- 
tem includes a table of punctuation. 

[0027] In another aspect of the invention, the system includes a table
of sub-words and corresponding sub-words in another language. For word
sound drill, for example, when learning a first language (given a
student who natively speaks a second language), sub-words from the first
language may be mapped to sub-words in the second language, to
illustrate sound alike comparison to the student. The sub-word table
will also be used to locate and display/play vocabulary words using the
sub-word fr[...]

[0028] In another aspect of the invention, a method is provided that
includes converting input text data to audible speech data, generating
audible speech comprising phonemes or diphones based on the audible
speech data, generating an animated image of a face and head pronouncing
the audible speech, synchronizing the audible speech and the animated
image, prompting the user to attempt to replicate the audible speech,
recognizing utterances generated by the user in response to the prompt,
comparing the phonemes or diphones to the utterances, and providing
feedback to the user based on the comparison.
[0029] In another aspect of the invention, a series of sentences is
provided which represent the basic inventory of phonemes and diphones in
a language. The student will read the sentences and they will be
recorded. The sub-words will be analyzed to determine a baseline score
or starting performance of the student. This may be used to determine
progress, to establish a level for exercises, or to identify areas to
work on.
[0030] In another aspect of the invention, a table of reference scores
is provided for grade levels in language classes given populations of
students. The student progress can be measured and graded on an
individual basis or as compared with the population of choice.

[0031] In another aspect of the invention, a score for the student's
speech will be provided in sub-words, words, sentences, or paragraphs.
The student can receive an overall score, or a score on individual parts
of the speech.

[0032] In another aspect of the invention, normalization issues
regarding verification of speech are managed through the interface.
Given speech of differing duration and complexity, the animated cursor
on the screen can be set by the system or by the student. When the
student reads along with the animated cursor, the verification process
can correlate the text which is highlighted with the sound file to be
analyzed.
[0033] In another aspect of the invention, certain recorded sounds can
be interjected for emphasis of natural sound for known sub-words or
words of a given language. These words may be taken from a previously
recorded dictionary, application, or other resource.
[0034] In another aspect of the invention, baseline 
scores are recorded in a table. The table is used to
determine the appropriate level of lesson to be selected for
the student. With this table, the system can automati- 
cally use the same text, content, etc. for students of dif- 
ferent abilities by modifying thresholds of confidence 
measurement. 
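
By way of illustration only, the use of one text for students of
different abilities might be sketched as below: a table of confidence
thresholds is kept per level, and the threshold matching the student's
baseline score is applied. The level names, score bands, and threshold
values are hypothetical and are not taken from this disclosure.

```python
from bisect import bisect_right

# Hypothetical table: lower bound of the baseline-score band for each level,
# and the confidence an utterance must reach to pass at that level.
LEVEL_FLOORS = [0.0, 0.4, 0.7]
LEVEL_NAMES = ["beginner", "intermediate", "advanced"]
PASS_THRESHOLD = {"beginner": 0.50, "intermediate": 0.65, "advanced": 0.80}


def level_for(baseline_score: float) -> str:
    """Pick the level whose score band contains the baseline score."""
    index = bisect_right(LEVEL_FLOORS, baseline_score) - 1
    return LEVEL_NAMES[max(index, 0)]


def passes(confidence: float, baseline_score: float) -> bool:
    """Same text, different bar: compare confidence with the level's threshold."""
    return confidence >= PASS_THRESHOLD[level_for(baseline_score)]
```

In this sketch the same utterance confidence of 0.6 would pass for a
student with a low baseline score and fail for one with a high baseline
score, which is the effect of modifying thresholds rather than content.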

[0035] In another aspect of the invention, the teacher or student can
use the graphical user interface to establish or modify thresholds for
the confidence measurement, grade level, or other attributes.
[0036] In another aspect of the invention, the student registers
identification, baseline score and subsequent lesson scores to achieve
customized lessons and to track progress.

[0037] Further scope of the applicability of the present invention will
become apparent from the detailed description provided below. It should
be understood, however, that the detailed description and specific
examples, while indicating preferred embodiments of the invention, are
given by way of illustration only, since various changes and
modifications within the spirit and scope of the invention will become
apparent to those skilled in the art.

Description Of The Drawings

[0038] The present invention exists in the construction, arrangement,
and combination of the various parts of the device, and steps of the
method, whereby the objects contemplated are attained as hereinafter
more fully set forth, specifically pointed out in the claims, and
illustrated in the accompanying drawings in which:

Figure 1 is a schematic illustration of a system
according to the present invention;

Figure 2 is an illustration of a window generated to
facilitate interactive learning according to the
present invention;

Figure 3 is a flowchart of the overall method according
to the present invention;

Figure 4 is a detailed flowchart of a text selection
and audible speech generation method according
to the present invention;

Figure 5 is a detailed flowchart of a text selection,
animation and audible speech generation method
according to the present invention;

Figure 6 is a detailed flowchart of a recording
method according to the present invention;

Figure 7 is a detailed flowchart of another recording
method according to the present invention;

Figure 8 is a detailed flowchart of a playback
method according to the present invention;

Figure 9 is a flowchart illustrating a student registration
method according to the present invention;

Figure 10 is a flowchart showing a grade level evaluation
(speech portion) according to the present invention; and

Figure 11 is a flowchart showing a scoring example
according to the present invention.

Detailed Description Of The Preferred Embodiments

[0039] Referring now to the drawings wherein the showings are for
purposes of illustrating the preferred embodiments of the invention only
and not for purposes of limiting same, Figure 1 provides a view of the
overall preferred system according to the present invention. As shown,
an interactive language instruction system 10 is provided. The system 10
includes a computerized apparatus or system 12 having a microcontroller
or microprocessor 14. The system 10 further has one or more input
devices 16 such as a keyboard, mouse, etc., a microphone 18, an input
link 20, one or more display devices 22, an audio speaker 24 and an
output file interface unit 26. All such components are conventional and
known to those of skill in the art and need not be further described
here. Moreover, it should be appreciated that the system 10 in suitable
form may also be incorporated in and/or compatible with client-server
and slim client architectures. It is to be further appreciated that the
system could be provided and deliverable through compact disks, the
Internet, or downloadable to a smaller or more mobile device.

[0040] The system 12 includes a variety of components which may be
incorporated therein as shown or may be remotely located from computer
12 and accessible over a network or other connection in accordance with
the present invention. As shown, the system 10 includes a text-to-speech
module, or TTS module, 30 and an automated speech recognition module, or
ASR module, 32. These modules are conventional and known to those of
skill in the art. Preferably, the TTS module 30 incorporates teachings
of, for example, U.S. Patent No. 5,751,907 to Moebius et al. (issued May
12, 1998) and U.S. Patent No. 5,790,978 to Olive et al. (issued August
11, 1998), and the ASR module (including the verbal information
verification portion 32a) incorporates, for example, the teachings of
U.S. Patent No. 5,797,123 to Chou et al. (filed December 20, 1996;
issued August 18, 1998); EP-A-892 387 A1; EP-A-892 388; and U.S. Patent
No. 5,649,057 to Lee et al. (issued July 15, 1997).

[0041] The TTS module 30 converts text stored as 
digital data to audio signals for output by the speakers 
24 in the form of phonemes and the ASR module 32 
converts audio signals received through microphone 18 
into digital data. Also provided to the system is an ani- 
mation module 34. 

[0042] The TTS module 30 has associated therewith a rules module 36 for
facilitating the conversion of the text to audible speech. More
specifically, the rules module 36 has stored therein code that allows
multilevel analysis of the words for which conversion to audible speech
is sought. The rules module sequentially analyzes a selected word,
analyzes the word in the context of the sentence (e.g. analyzes the
surrounding words or the part of speech (e.g. determines whether
"address" is a noun or a verb)), and then analyzes the sentence format
(e.g. determines whether the sentence is a question or a statement).
This analysis scheme facilitates a more accurate pronunciation of each
word (e.g. proper emphasis) in the context in which the word is used.
The TTS module 30 is also in communication with a dictionary file or
recorded dictionary 38 to facilitate proper pronunciation of selected
words and, of course, a lesson file 40 from which text for lessons is
retrieved. It is to be appreciated the lesson text may also be obtained
through input link 20 from various other sources including the Internet,
LANs, WANs, scanners, closed caption devices, etc. This feature allows
the content of the lessons to be separated from the functions of the
system. That is, the system and method of the present invention can be
applied to different lesson content to suit the needs and/or desires of
the user or student.
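
By way of illustration only, the three levels of analysis described in
paragraph [0042] (the word itself, the word in its sentence context, and
the sentence format) might be sketched as follows. The pronunciation
table, the list of context cues, and the names `analyze` and `Analysis`
are hypothetical simplifications; the actual rules stored in the rules
module 36 are not specified in this disclosure.

```python
from dataclasses import dataclass

# Toy pronunciation table keyed by (word, part of speech); entries are illustrative.
PRONUNCIATIONS = {
    ("address", "noun"): "AD-dress",
    ("address", "verb"): "ad-DRESS",
}
# Hypothetical context cue: words that commonly precede a verb.
VERB_CUES = {"to", "will", "please", "i", "we", "you", "they"}


@dataclass
class Analysis:
    word: str
    part_of_speech: str
    pronunciation: str
    sentence_type: str


def analyze(sentence: str, target: str) -> Analysis:
    """Analyze one word at three levels: word, context, sentence format."""
    words = [w.strip(".,?!").lower() for w in sentence.split()]
    position = words.index(target)                      # level 1: the word itself
    previous = words[position - 1] if position > 0 else ""
    pos = "verb" if previous in VERB_CUES else "noun"   # level 2: surrounding words
    kind = "question" if sentence.strip().endswith("?") else "statement"  # level 3
    spoken = PRONUNCIATIONS.get((target, pos), target)
    return Analysis(target, pos, spoken, kind)
```

For example, `analyze("Please address the class.", "address")` would
classify the word as a verb in a statement, whereas
`analyze("What is your address?", "address")` would classify it as a
noun in a question, and a different emphasis would be selected in each
case.
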
[0043] The preferred TTS module or engine includes therein model
pronunciations of all words and sub-words entered in text. These model
files are ultimately used to compare with the words spoken by the
student, as will be described in greater detail below. Any word in a
dictionary or file can be used with the system of the present invention.
In other language learning products, lessons are limited to the recorded
words and phrases. The preferred TTS module provides the capability to
recognize text or a text file and process it, converting it to audible
speech. The preferred addition of a TTS module provides flexibility for
lesson production in that it repurposes other materials, current events,
news stories, web content, files, specialized documents, etc. and
provides the ability to apply the application to special needs
situations such as speech therapy where customized word practice is
desirable.
[0044] For a language student, the preferred TTS 
module provides examples of text pronounced accord- 
ing to the rules of speech in that language. The vocal 
quality of the TTS module is preferably extremely high. 
The student can thus listen to the pronunciations of any 
word in the dictionary or file even when a native English
speaker is not available, and without requiring that the
words used in lessons be previously recorded and
stored in the system. Inflections and tonal variations 
common to language in context are included in the sys- 
tem which would be difficult to do with recorded speech. 
The TTS module also accommodates regional accents 
through the addition of specific pronunciation files which 
may be used in a specific context to demonstrate pro- 
nunciation alternatives including but not limited to: 
American, regional American, English pronunciation of 
Spanish words, proper names, trademark and technical 
words, etc. 

[0045] The ASR module 32 includes a verbal information verification
(VIV) portion 32a for providing utterance verification to the ASR module
32. This preferred form of the ASR module having the verbal information
verification (VIV) portion compares the phonemes output by the TTS
engine, its own acoustic model, or any derived acoustic model with the
utterances spoken by the student. The VIV portion analyzes the
similarity with which a speaker matches the file created by the TTS
module 30. This comparison provides the basis of the feedback to the
student. An overall score is offered to the student for feedback. In
addition, individual word parts or phoneme matches are analyzed to
indicate where precisely the student may be having difficulty in
pronunciation. Feedback is provided to the student for each portion of
the speech created. Reinforcement for pronunciation is provided to the
student based upon rules of the language, identification of the word or
word segment identified, known pronunciation problems carried from the
student's native language, and the student's level of achievement.
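
By way of illustration only, the combination of an overall score with
feedback on individual phoneme matches might be sketched as below. The
sketch aligns two phoneme label sequences with the sequence matcher of
the Python standard library; this is a stand-in chosen for brevity and
is not the utterance verification or VIV technique of the cited patents,
which operates on acoustic confidence measures. The phoneme labels and
the function name `compare_phonemes` are hypothetical.

```python
from difflib import SequenceMatcher


def compare_phonemes(reference, recognized):
    """Return (overall_score, problems) for two phoneme label sequences.

    overall_score is the similarity ratio in [0, 1]; problems lists each
    stretch where the recognized phonemes differ from the reference.
    """
    matcher = SequenceMatcher(a=reference, b=recognized, autojunk=False)
    problems = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is "replace", "delete", or "insert"
            problems.append((tag, reference[i1:i2], recognized[j1:j2]))
    return matcher.ratio(), problems


# Reference phonemes for "think" versus a student who says "sink".
score, problems = compare_phonemes(["TH", "IH", "NG", "K"],
                                   ["S", "IH", "NG", "K"])
```

Here three of the four phonemes match, so the overall score is 0.75, and
the single reported problem identifies the initial "TH" as having been
replaced by "S", which indicates where the student is having difficulty.
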
[0046] The animation module 34 provides visual aid to a student. The
module 34 is synchronized with the TTS module 30, retrieves text files
and, together with the TTS module or engine, pronounces the word for the
student through an animated image of a human head and face. Preferably,
the animated image of the face and human head portrays a
three-dimensional perspective and the image has the capability of being
rotated, tilted, etc. for full view from various angles. Accordingly,
the student can observe characteristics of facial and mouth movements,
and placement of the tongue, lips and teeth during speech examples. The
animation module synchronizes the facial movement with processing of the
TTS module in manners that are well known to those of skill in the art.
The student can observe the animated image, or teacher, from any angle,
with normal or transparent mode to further observe teeth and tongue
placement. The teacher example can be modified. Volume, speed, and vocal
characteristics of the teacher may be changed by the student using the
computer interface. Voice may be male or female, high or lower, fast or
slower. As will be described hereafter, reinforcement will be provided
to the student based upon rules of the language, known pronunciation
problems carried from the student's native language and the student's
level of achievement.

[0047] The system 10 also includes a workspace 
module 42 that generates and facilitates processing in a 
viewable workspace on the display device 22. The work- 
space module 42 is linked to a pronunciation module 
44, mapping tables 46, word processing module 48 and 
record and playback module 50. 

[0048] The pronunciation module 44 includes databases containing records
for words, word subgroups, vocabulary words used to teach typical sounds
in a language, examples from parts of speech used to teach contextual
pronunciation of words and tables of punctuation. The sample words are
selected in creating the pronunciation databases based on grammatical
and linguistic rules for the language. Preferably, the sample words for
each character or character group (e.g. diphthong) are ordered generally
from more common usage in pronunciation of the character to a less
common usage. The module 44 also accommodates regional accents through
the addition of specific pronunciation files which may be used to
establish a profile in a specific context to demonstrate pronunciation
alternatives including but not limited to: American, regional American,
English pronunciation of Spanish words, proper names, trademark and
technical words, etc.
[0049] The mapping tables 46 include tables 46a having stored therein
mappings between the word subgroups and the vocabulary words used to
teach typical sounds in a language, tables 46b having stored therein
mappings between the words and the vocabulary words used to teach
typical sounds in a language, and tables 46c having stored therein
mappings between the words and the examples from parts of speech to
teach contextual pronunciation of words. The system also includes tables
46d storing examples of punctuation typically used in a language that
may be used in lessons independently, or in the context of a sub-word,
word, or group.
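
By way of illustration only, the mapping tables 46a to 46d might be held
as simple keyed lookups, as sketched below. Every entry, table name, and
helper function shown is hypothetical; the disclosure does not specify
the storage format or the contents of the tables.

```python
# Hypothetical contents for tables 46a-46d; the entries are illustrative only.

# 46a: word subgroup -> vocabulary words, ordered from more to less common usage
SUBGROUP_TO_VOCABULARY = {"ou": ["out", "soup", "tough"]}

# 46b: word -> vocabulary words used to teach its typical sounds
WORD_TO_VOCABULARY = {"address": ["add", "dress"]}

# 46c: word -> example sentence for each part of speech
WORD_TO_PART_OF_SPEECH_EXAMPLES = {
    "address": {
        "noun": "Write your address here.",
        "verb": "She will address the class.",
    },
}

# 46d: punctuation mark -> example of typical use
PUNCTUATION_EXAMPLES = {"?": "Is this your book?"}


def vocabulary_for(subgroup: str) -> list:
    """Return the vocabulary words mapped to a word subgroup (table 46a)."""
    return list(SUBGROUP_TO_VOCABULARY.get(subgroup, []))


def examples_for(word: str) -> dict:
    """Return the part-of-speech examples mapped to a word (table 46c)."""
    return dict(WORD_TO_PART_OF_SPEECH_EXAMPLES.get(word, {}))
```

Keeping each mapping in its own table would let a lesson draw on one
table independently of the others, for example using only the
punctuation examples.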

[0050] Referring now to Figure 2, a representative view of the primary
window generated by the system 10 and appearing to the user is shown.
The window 60 includes a workspace 62 associated with the workspace
module 42, a feedback area 64 primarily associated with the ASR module,
an animation area 66 primarily associated with the animation module, and
a control area 68 primarily associated with the TTS module and the
animation module. The workspace 62 facilitates display and manipulation
of text for lessons for the student. The feedback area 64 facilitates
display and manipulation of feedback provided to the student by the
system, as will be hereafter described. The animation area includes, as
shown, an exemplary animated face and head 66a. Last, the control area
includes user interface control icons such as volume adjustment 68a,
speed adjustment 68b, a stop button 68c, a play button 68d, a pause
button 68e, and a record button 68f. The student interactively
manipulates the window 60 to perform functions according to the present
invention.
[0051] The overall method of the preferred embodiment is illustrated in
Figure 3. It is to be appreciated that the methods described in
connection with Figure 3, as well as Figures 4-11, are implemented using
hardware and software techniques that will be apparent to those of skill
in the art upon a reading of the disclosure hereof.
[0052] As shown, the method 300 is initiated with the input of text
(step 302) and subsequent conversion of the input text to audible speech
data (step 304). Audible speech is generated and output based on the
audible speech data (step 306). Of course, the audible speech can also
be represented by a variety of models including predictive models,
phoneme models, diphone models or dynamically generated models. These
models are generated primarily by the ASR module and associated
elements. However, in certain circumstances, the TTS module may also be
used to generate the acoustic models. When desired by the student, an
animated image of a human face and head is then generated primarily by
the facial animation module 34 (step 308) and the audible speech and
animated image are synchronized (step 310). A student is then prompted
to replicate the audible speech with spoken words, or utterances (step
312). The system then recognizes the utterances of the student (step
314) and compares these utterances to the audible speech data primarily
through the module 32 (including portion 32a) (step 316). Feedback is
then provided to the student based on the comparison and a confidence
measure which is correlated to customized scoring tables and used as a
calibration point, as is known in the art, in a variety of manners as
described below (step 318). The feedback preferably reflects the
precision at which the user replicates the audible speech in the
selected language.
[0053] Figure 4 provides a more detailed description for steps 302, 304,
and 306. More particularly, a submethod 400 includes the selection of
input text (step 402) followed by retrieval of the text using either a
Universal Resource Locator (URL), or a stored file (step 404). If a URL
is used, the URL is typed into the field and the text is retrieved (step
406). If the text is stored in a file, the file is selected (step 408).
In either event, the retrieved text is displayed in the workspace 62
(step 410). The play button 68d is then pressed, or "clicked on" (step
412). A determination is made whether the selected text originated from
a source located using a URL or a file (step 414). If the text
originated by way of a URL, the markup language is preprocessed (step
416). The text may be preprocessed to present ideal form for TTS
processing, for example, removing any markup language, textual
illustrations, parsing known or provisioned formats such as email,
faxes, etc. In either case, a subset of the text is then prefetched
(step 418) and text-to-speech processing is initiated (step 420).
Optionally, the speed and volume of the speech is checked (steps 422 and
424). The sound is then played (step 426) and a determination is made
whether the playing of the audible speech is complete (step 428). If the
playing of the audible speech is not complete, steps 418 to 428 are
repeated. If the playing of the audible speech is completed, the process
ends (step 430).
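
By way of illustration only, the markup preprocessing and the repeated
prefetch-and-play loop of submethod 400 might be sketched as follows.
The helper names `strip_markup` and `play_text`, the choice of one
sentence as the prefetched subset, and the `speak` callback standing in
for the TTS module are all assumptions made for this sketch; scripts,
styles, and the email and fax formats mentioned above are not handled.

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "li", "br"}


class _TextOnly(HTMLParser):
    """Collect character data and drop tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")  # keep adjacent blocks from running together


def strip_markup(markup: str) -> str:
    """Remove markup so that only readable text reaches text-to-speech."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def play_text(text: str, from_url: bool, speak) -> int:
    """Preprocess if needed, then fetch and play one sentence at a time."""
    if from_url:
        text = strip_markup(text)
    chunks = [c for c in re.split(r"(?<=[.?!])\s+", text) if c]
    for chunk in chunks:  # repeated until playing is complete
        speak(chunk)
    return len(chunks)
```

A caller could pass any callable as `speak`, for example a list's
`append` method during testing, and a function that drives a speech
engine in use.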
[0054] In a situation where animation is used (e.g. a 
teacher prompt), a detailed description of steps 302 
through 310 of Figure 3 is shown in Figures 4 and 5. For 
brevity, submethod 500 simply replaces steps 418 to 
430 of Figure 4. 

[0055] Referring now to Figure 5, after the play button 68d is pressed,
the subset of the selected text is prefetched (step 502). Text-to-speech
processing is then initiated (step 504). Text animation processing is
also initiated (step 506). The speed and volume are then checked (steps
508 and 510) and adjusted if necessary. The sound and facial movement
are output to the user in the animation area 66 (steps 512 and 514). A
determination is then made whether the playing of the audible speech
with animation is completed (step 516). If not complete, steps 502 to
516 are repeated. If playing of the audible speech with the animation is
complete, the process is ended (step 518).
[0056] Referring back to Figure 3, the student is
prompted at step 312 to replicate the audible speech
played. In this regard, the student is actively or
passively prompted by the system to repeat the
teacher's example. Methods of prompting include a
moving cursor, moving highlighted area of text, or
moving icon. Audible prompts may be used including but
not limited to: stating "repeat after me" and then
stating the word to be repeated. The speed of the
prompt is also adjustable.

[0057] The student may choose to record his or her
attempts at speech during a lesson. The student can
listen to the teacher and his or her recording for a
side-by-side comparison. The recording can also be used
by the Automated Speech Recognition function to
determine the student's performance as will be
described below.
[0058] As shown in Figure 6, the recording method
is initiated by selecting (or pressing or "clicking on")
the play button 68d and record button 68f (step 602). A
determination is then made whether the sought text file
is stored or should be retrieved by way of a URL (step
604). If a URL is involved, the markup language must be
preprocessed (step 606). The text is then prefetched
(step 608) and highlighted text is particularly selected
(step 610). In either case, text-to-speech processing is
initiated (step 612) and then a determination is made
whether animation should be used (step 614). If
animation is desired, the animation data is processed
(step 616). Whether the animation is processed or not,
the speed and volume are checked (steps 618 and 620).
The sound is then played along with the animation, if
desired (steps 622 and 624). A determination is made
whether the playing is done (step 626). If the process
is not complete, steps 606-626 are repeated.
[0059] If playing is completed, as shown in Figure 7,
a prompt to the student is made by the system (step
702). Text for student replication is highlighted (step
704). The speed is checked (step 706). Recording of
the student is begun (step 708) and the cursor is moved
at a designated speed (step 710). A determination is
then made whether the process is complete (step 712).
If not, steps 702 to 712 are repeated. If the process of
recording is complete, the process is ended (step 714).
[0060] Referring back to Figure 3, the system rec- 
ognizes the utterances, compares them to the audible 
speech files or records for which models can be
generated, and provides feedback (steps 314, 316, 318).
With respect to the step 316, as will be appreciated by
those skilled in the art, the comparison could occur
between the utterances and any of the following:
audible speech, any acoustic model, predictive model,
phoneme models, diphone models, or dynamically
generated models. The feedback may be provided in a
variety of manners. One form of feedback is to allow
the student to play back the lesson.

[0061] Referring now to Figure 8, a playback
method is shown. In this regard, the method 800 first
makes a determination whether a teacher is selected or
a student is selected for playback (step 802). If a
teacher is selected, text is highlighted (step 804),
speed and volume are checked (steps 806 and 808),
text-to-speech and animation data are processed (steps
810 and 812) and the sound is played and animation
moved (steps 814 and 816). A determination is then
made whether the playback is complete (step 818). If
not, steps 804 to 818 are repeated. If the process is
complete, it is terminated (step 820).
[0062] If a student is selected at step 802, the text
to be played back is highlighted (step 822). The speed
and volume are then checked (steps 824 and 826). The
sound, or recorded utterances of the student, is played
(step 828). A determination is then made as to whether
playback is complete (step 830). If not, steps 822 to 830
are repeated. If the student playback is complete, the
process is ended (step 832).

[0063] The student may also select to be evaluated
to see how closely his or her pronunciation matches the
teacher's model pronunciation. Automated Speech
Recognition (utterance verification) and Verbal
Information Verification (VIV) are used through modules
30, 32 and 32a (and associated elements) to determine
accuracy in pronouncing words, word segments,
sentences, or groups of sentences. In a preferred form,
utterance verification would identify plosives such as
"P" and "B" or "T" and "D". Scoring the accuracy
includes but is not limited to: gross score for overall
performance, score on individual sentences, score on
individual words, and score on phonemes.
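
The scoring levels listed above (phoneme, word, sentence, gross) can be illustrated with a short sketch. It assumes per-phoneme confidence scores between 0 and 1 are already available from the recognizer and rolls them up by simple averaging. The averaging rule and the names word_score and score_report are assumptions for illustration, not the method the system prescribes.

```python
from statistics import mean


def word_score(phoneme_scores):
    """Score a word as the mean of its phoneme scores."""
    return mean(phoneme_scores)


def score_report(sentences):
    """Build word-level, sentence-level, and gross scores.

    `sentences` maps each sentence to a dict of word -> list of
    phoneme scores (hypothetical input format).
    """
    report = {"sentences": {}, "words": {}}
    for sentence, words in sentences.items():
        per_word = {w: word_score(p) for w, p in words.items()}
        report["words"].update(per_word)
        report["sentences"][sentence] = mean(list(per_word.values()))
    report["gross"] = mean(list(report["sentences"].values()))
    return report
```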

[0064] Such feedback to the student takes several
forms, and may be used to score performance, determine
reinforcement, and determine feature levels of the
application (novice, intermediate, advanced). Feedback
may be given explicitly or through a file sent to a
teacher through output file storage 26 (Figure 1), or
both.

[0065] Overall scores include numeric values (for
sentence groups, sentences, words, and sub-words)
calibrated to account for the student's level such as
novice, intermediate, expert, year of study associated
with a syllabus to be used as a reference file, or
other. The system may be set to display or not to
display this score information to the student in the
feedback area 64. The application can determine
student's level through statistical information
contained within the system or available over a network
and student specific information collected while the
student is interacting with the system, or by student
level self-identification, or by teacher or therapist
provisioning. Icons may be used to indicate level of
performance for the student feedback including but not
limited to a series of symbols such as stars, circles,
etc. arranged in groups, and any of the many symbols
frequently used to indicate successful task completion.
An example would be to have three circles available.
When the student needs some improvement, two would be
shown. When the student needs more improvement to match
the model, only one circle would be shown. When the
student is successful in closely matching the model
(based on pre-determined student level) all circles
would be displayed. Color may be used to indicate level
of performance.
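
The three-circle example can be sketched as a small lookup keyed on the student's level. The threshold values below are invented for illustration, and the name circles_to_show is hypothetical. Only the idea of showing one, two, or three circles relative to a level-dependent standard comes from the description.

```python
# Hypothetical thresholds per student level:
# (score needed for two circles, score needed for three circles).
LEVEL_THRESHOLDS = {
    "novice": (0.40, 0.60),
    "intermediate": (0.55, 0.75),
    "advanced": (0.70, 0.90),
}


def circles_to_show(score, level):
    """Return how many of the three circles to display for a score in [0, 1]."""
    needs_some, close_match = LEVEL_THRESHOLDS[level]
    if score >= close_match:
        return 3  # student closely matches the model
    if score >= needs_some:
        return 2  # some improvement needed
    return 1      # more improvement needed
```

With these assumed thresholds, the same raw score yields different feedback for students at different levels, which is the calibration effect the paragraph describes.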
[0066] Feedback on performance may be given
while the student is reading the text or after task
completion. While the student is reading the text, the
Verbal Information Verification processing (or utterance
verification processing) can be used to display
real-time performance feedback. The system may use any
number of graphical or audio cues to indicate
performance including but not limited to bars, icons,
colors, sound effects, or TTS text files. The system
will indicate to the student that there is a problem and
will help the student to decide if he or she should
repeat the task, change the speed, move to another mode
or feature such as word building, or listen to the
teacher example again. Default options will be
established based upon the specific performance issue
and will be determined by, for example, the VIV feature.

[0067] Once a student has been practicing for some 
period of time, he or she can again request feedback. A 
summary can be created to provide feedback to the stu- 
dent including but not limited to highlighted words within 
text, overall scores, discrete scores for segments of 
work, icons to illustrate overall achievement, and audi- 
ble feedback indicating performance. The audible feed- 
back can be a sound effect such as a cheering crowd, or 
a can sound when a cursor is moved over a word that 
was not pronounced well. The student can play back 
and listen to the model and their own pronunciation for 
comparison. 

[0068] A word and punctuation list can be used to 
practice pronunciation skills, review definitions, etc. The 
list may be created from a lesson file (e.g. lesson file 
40), from a dictionary or other reference material, from 
the Internet (e.g. through input link 20), or from a list
of sub-words, words, or groups (e.g. stored in
pronunciation file 44) pronounced inaccurately by the
student. One advantage to the system is that combinations of
words and punctuation impact pronunciation and the 
system can identify these cases and provide feedback 
and reinforcement for these cases. For any case includ- 
ing for words that have been mispronounced, the sys- 
tem can arrange the words into an order such as closest 
match through not well matched and as a default, can 
present the items needing most work at the top of the 
list. Word lists appear in the working window. In the 
example given, a working window appears in the feed- 
back area 64. The student can use a tool to scroll 
through the list. The student can highlight and select a
word with a voice command or mouse. When the phoneme
or word (or group) is highlighted, the teacher's voice
is heard pronouncing the word. An optional feature on
highlighting a sub-word, word, or group is to set the
system to repeat the teacher's voice and also the
student's voice for side by side feedback or to go to
the recorded dictionary to play a sound file. The
student can try to pronounce the word again at this
point and get feedback. When the word is selected, the
student can see a more detailed workspace feature in
the window. The workspace feature uses language rules
to process the sub-word, word, or group and display
spelling, punctuation, stresses (accents), syllables,
etc. The student can select to hear the example again
and try to pronounce it. The student is scored again,
and if performance is improved and feedback is
satisfactory as determined by the student or teacher,
the word lesson is ended. If not, more help may be given
by the system.
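
The default ordering of the word list, with the items needing most work at the top, might look like the following sketch. It assumes each word already carries a match score between 0 and 1. The function name order_word_list and the alphabetical tie-break are illustrative choices, not part of the described system.

```python
def order_word_list(scores, worst_first=True):
    """Return words sorted by match score; ties are broken alphabetically.

    `scores` maps each word to its match score (hypothetical input).
    """
    sign = 1 if worst_first else -1
    return sorted(scores, key=lambda word: (sign * scores[word], word))
```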
[0069] If the student has trouble pronouncing the
word with audible example and feedback, reinforcement
is offered through the working window 60. Moving the
cursor over a portion of the displayed sub-word, word,
or group activates the default feature to pronounce it.
This feature may be turned off. Selecting the word
portion provides reinforcement windows with help for
the student. An example of reinforcement help includes
a message saying "Try this ... in the word 'graduate'
the 'd' is pronounced with a 'j' sound as in 'jar.'"
With a table indicating known language rules for
pronunciation, text messages are dynamically created
based upon the circumstances within the selected
sub-word, word, or grouping. The student sees the
message in the window, and also hears the teacher speak
the message.
[0070] Messages are nested by the system. If there
are multiple linguistic reasons why a match is not made
between the model and the student in a particular
sub-word, word, or group case, then the messages are
presented to the student in an order determined ahead
of time. The most likely pronunciation rule is first,
then less likely rules in a determined order.
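
A rule table with a predetermined message order can be sketched as below. The table layout, the priority numbers, the second rule, and the name help_messages are all hypothetical. Only the 'graduate' message is taken from the description, and matching on letter patterns is a simplification of whatever linguistic analysis the system would actually apply.

```python
# Hypothetical rule table: (priority, letter pattern, help message).
# A lower priority number marks the more likely explanation.
RULES = [
    (2, "ate", "Try this ... listen again to the ending of '{word}'."),
    (1, "du", "Try this ... in the word '{word}' the 'd' is pronounced "
              "with a 'j' sound as in 'jar.'"),
]


def help_messages(word):
    """Return the messages whose pattern occurs in `word`, most likely first."""
    matches = [(rank, msg) for rank, pattern, msg in RULES if pattern in word]
    return [msg.format(word=word) for rank, msg in sorted(matches)]
```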

[0071] An analysis of known errors of pronunciation
will be used to assist the student. For example, there
are known linguistic errors made by Korean students
studying English. The "Try this..." system of messages
will include considerations for the user of the system
and will present instructions most likely to help that
particular student based upon student
self-identification. Text or audible help for this
feature may be presented in the native language or the
target language, a combination, or both. For example,
the pronunciation files 44 may include a table of
sub-words and corresponding sub-words in another
language. For word sound drill, for example, when
learning a first language (given a student who natively
speaks a second language), sub-words from the first
language may be mapped to sub-words in the second
language, to illustrate sound alike comparison to the
student. The sub-word table will also be used to locate
and display/play vocabulary words using the sub-word
from either language. Another practice feature
associated with the workspace is an option to list
practice sub-words, words, or groups in a window, and
permit practice of sounds relating to the specific
problem encountered by the student. An example would be
to highlight the area needing practice, such as 'al.' A
list would be displayed with words containing this
combination such as 'balk,' 'talk,' and 'walk.' The
teacher would read the example, and the student could
practice the words. Then the student could return to the
original sub-word, word, or group being drilled, and
continue to practice.
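
The sub-word table used to list practice words can be sketched as a simple mapping from a letter combination to the vocabulary words that contain it. This is an illustrative assumption: the function name build_subword_table is hypothetical, and matching is done on spelling rather than on sound, which a real pronunciation table would need to handle.

```python
from collections import defaultdict


def build_subword_table(vocabulary, subwords):
    """Map each sub-word to the vocabulary words that contain it."""
    table = defaultdict(list)
    for word in vocabulary:
        for sub in subwords:
            if sub in word:
                table[sub].append(word)
    return dict(table)
```

Looking up the combination shared by 'balk,' 'talk,' and 'walk' in such a table would return the practice list shown to the student.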
[0072] The student can review the lesson in any
mode including teacher example; example and prompt;
example, prompt and record; and example, prompt,
record, and compare.
[0073] The student lessons may be presented in a
graphic illustration. The student can zoom in for
further detail. The student can navigate around the
content and highlight and select an area or detail to be
studied. The student performance may be presented in
telescoping graphical representations to provide access
to all or portions of the session completed. The student
can zoom in and refine skills, review complete sessions
or select portions. Higher levels will be illustrated
with less detail. Zooming in will provide smaller pieces
with more detail. The student can decide where in the
lesson to begin from this overall graphical
representation.
[0074] As to scoring and evaluation of the
performance of a student, a variety of techniques and
operations may be incorporated into the system. Scoring
techniques are well known in the art. However, in a
preferred form, customized scoring tables are generated
with confidence scores as calibration points, as is
known in automated speech recognition practice. For
example, a series of sentences may be provided which
represent the basic inventory of phonemes and diphones
in a language. The student will read the sentences and
they will be recorded. The sub-words will be analyzed to
determine baseline score or starting performance of the
student. This may be used to determine progress, to
establish a level for exercises, or to identify areas to
work on. A table of reference scores may also be
provided for grade levels in language classes given
populations of students. The student progress can be
measured and graded on an individual basis or as
compared with the population of choice.
[0075] Scores for a student speech are provided in 
sub-words, words, sentences^ or paragraphs. Students 
may receive an overall score, or a score in individual 
parts of the speech. 

[0076] Normalization issues regarding verification
of speech are managed through an interface. Given
speech of differing duration and complexity, the
animated cursor on the screen can be set by the system
or by the student. When the student reads along with the
animated cursor, the verification process can correlate
the text which is highlighted with the sound file to be
analyzed.

[0077] Certain recorded sounds can also be
interjected for emphasis of natural sound for known
sub-words or words of a given language. These words
might be taken from previously recorded dictionary,
application, or other resource.
[0078] Baseline scores can then be recorded in a
table (such as shown in Figure 1 at 52). The table 52
may take a variety of forms and is used to determine an
appropriate level of a lesson or grading approach to be
selected for the student. With this table, the system
can automatically use the same text, content, etc. for
students of different abilities by modifying thresholds
of confidence measurement.

[0079] The student can also use a graphical user
interface to establish or modify thresholds for the
confidence measurement, grade level, or other
attributes. To track his or her progress, the student
registers an identification number, baseline score, and
subsequent lesson scores to achieve customized lessons
and to track progress.

[0080] More specifically, Figure 9 illustrates a
preferred method for a student to so register. The
method 900 is initiated by the entry of an
identification number by the student (step 902). A
student grade level evaluation process is then completed
(step 904). A score is recorded (step 906) and a
subsequent lesson is selected (step 908). The selected
lesson is scored (step 910) and the student's record is
updated (step 912).

[0081] Referring now to Figure 10, the student
grade level evaluation process of step 904, for example,
is detailed as a method 1000. In this process, a first
paragraph is displayed (step 1002). The student reads
the first paragraph (step 1004). A confidence score is
measured by the system (step 1006). Grades are provided
for the total paragraph as well as sub-paragraph
elements (step 1008). The scores are compared to other
scores using a table lookup (step 1010) to determine if
a predetermined threshold is exceeded (step 1012). If
the threshold is exceeded, then a second paragraph is
displayed for evaluation or, if the threshold is not
exceeded, the grade level is simply displayed (step
1014).

[0082] If the second paragraph is displayed, the
student reads the second paragraph (step 1016) and a
confidence level is measured (step 1018). A total
paragraph and sub-paragraph score is obtained (step
1020) and, again, a table lookup is used to determine
the grade or score (step 1022). The steps are repeated
until the score obtained from the table lookup does not
exceed the predetermined threshold (step 1024).
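
The loop of Figure 10 can be sketched as follows, under the simplifying assumption that the confidence score for each paragraph has already been measured and that each grade level has one threshold. In the described system the scores are obtained interactively as the student reads. The name evaluate_grade_level and the list-based inputs are hypothetical.

```python
def evaluate_grade_level(paragraph_scores, thresholds):
    """Return the index of the grade level at which the student stops advancing.

    paragraph_scores: confidence scores, one per paragraph read, in order.
    thresholds: the score that must be exceeded to move past each level.
    """
    level = 0
    for score, threshold in zip(paragraph_scores, thresholds):
        if score <= threshold:
            break  # threshold not exceeded: report the current level
        level += 1  # threshold exceeded: present the next paragraph
    return level
```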
[0083] Referring now to Figure 11, a scoring
example 1100 for a lesson selected and scored in steps
908 and 910 is illustrated. The exemplary lesson has a
score of ninety percent (90%) for an elementary level
student. First, the student is requested to recite a
sentence for which scores are given for each word of the
sentence as illustrated (step 1102). These scores for
each word identify words for which lessons should be
given to the student. In the example shown, the word
"fox" only received a 50 percent score so the student is
further tested on the word "fox" (step 1104). The
student's pronunciation of each letter of the word "fox"
is then given a score and, in the example, the "f" and
"x" sounds are determined to require further lessons
(step 1106). Elementary vocabulary words with the "f"
and "x" sound are respectively selected for lessons
(step 1108). The same operation of steps 1104 to 1108 is
repeated for other words or sounds that were given low
scores (e.g. "jumps", "lazy" and "dog's") in the initial
sentence in step 1102 (step 1110). A variety of words
are then drilled in the lesson, including each
identified sound (step 1112). If necessary, recorded
sounds from the dictionary are played as model sounds
for the student (step 1114). The lesson is then scored
and a table is created for lesson evaluations (step
1116).
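
The selection of drill words in steps 1104 to 1108 can be sketched as below. Only the 50 percent score for "fox" comes from the example. The per-sound scores, the 0.6 cut-off, the vocabulary, and the name select_drills are hypothetical, and sounds are matched on spelling for simplicity.

```python
def select_drills(word_scores, sound_scores, vocabulary, threshold=0.6):
    """Pick practice words for every weak sound in every weak word."""
    drills = {}
    for word, score in word_scores.items():
        if score >= threshold:
            continue  # word was pronounced well enough
        for sound, s in sound_scores.get(word, {}).items():
            if s < threshold:
                # Offer other vocabulary words that contain the weak sound.
                drills[sound] = [v for v in vocabulary
                                 if sound in v and v != word]
    return drills
```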
[0084] The system and method according to the
present application provide many advantageous features
and applications. Functions described above are
combined to create feature rich applications for
learning. The system includes scripted lesson
combinations to enable any student to use the system
with ease. With experience, the student or teacher can
arrange for customized combinations of functions to help
a specific student learning issue or learning style (and
for creating individualized plans for compliance with
PL94-142). The system will also recommend feature
combinations based upon scores of the student and
available functions associated with improving student
skills. The system includes files with lessons tailored
to helping students learn the basics of pronunciation
for the target language. In addition, tables of
references for the language, grammar, spelling, syntax,
and pronunciation rules are included with their
associated help messages for reinforcement learning. The
flexibility of this system makes it an ideal tool for
the classroom and for the adult student.

[0085] Directed learning experience - The basic
system feature is to demonstrate language by processing
text to speech and playing that speech to the student.
The student can become familiar with the sound of the
language to be learned. The student can listen to the
examples and learn about pronunciation.
[0086] Listen to any word - With Text-To-Speech
technology, the student can learn to imitate the
language sound even when a native speaker is not
available. Availability of recorded samples, lessons,
etc. and the availability of a dedicated native speaker
are constrained resources for students studying English,
for example, in many environments. With Text-To-Speech,
those constraints are eliminated. All materials on the
web, any text file, and any prepared lesson become a
multi-media language lesson. Any automatically generated
text file may be used to create up-to-the-minute
language lessons with this system. For example, by
collecting closed captioning text from any movie,
television or news program, text files may be created
that can be used by the functions of the system as
content.
[0087] Listening comprehension - The basic system
feature of processing Text-To-Speech provides
opportunities for a student to practice listening
comprehension skills without requiring the production of
special content, and without requiring another person to
be present. In this case, the text may be hidden to
improve the performance of the feature.

[0088] Example and prompt - By combining
Text-To-Speech processing of the text with the Facial
Animation, an example is created for the student.
Another feature of the system adds a prompt for the
student to repeat the example. The student can use this
feature of the system to hear an example and then
practice without being recorded, graded, or receiving
feedback from the system.

[0089] Example, prompt, record - The system can
combine three functions to provide a means for the
student to listen to the example, hear and/or see a
prompt of when to read and what to read, and to record
his or her own efforts to say the sub-word, word, or
phrase.
[0090] Example, prompt, record, playback - The
system can combine functions to provide a means for
the student to listen to the example, hear and/or see a
prompt, record speech, and then play back the example
and the student speech for side by side comparison by
the student.

[0091] Self selected reinforcement - If the student
identifies a problem with a particular recorded sample
and determines that help is needed, the student can
access context specific help which is described in the
function section workspace section. The student then
has access to a help system that can identify the rules
of the language associated with the word highlighted and
can present the "Try this..." series in a predetermined
order based upon known student errors in the general
population or in the group with which the student is
identified. The student can view and listen to some or
all of the reinforcement help messages.
[0092] Example, prompt, record, playback, compare,
display results - One of the comprehensive features of
this system includes the combination of the teacher
example with audio and visual, options of altering
appearance of the teacher, options of altering the sound
characteristics of the teacher, seeing and/or hearing a
prompt, recording speech, allowing for playback to hear
performance, using Automated Speech Recognition to
process the student's spoken words, obtain a metric for
performance, and then to display that performance within
a flexible and adaptable frame of reference.

[0093] Grammar skills - With the addition of a word
processing component, the language tutor can teach or
reinforce grammar skills. The student can listen to a
text passage and question, formulate an answer and speak
or type the answer into the system. The results of the
word processing program will generate examples of errors
in sentence syntax, etc. that will be used by the system
to recommend "Try this..." examples based upon known
rules of the language. Problem areas will be highlighted
as described above, and the student can use the working
window to practice skills. Lessons on typical
pronunciation issues for speakers of Asian languages
when learning English are included in the system.

[0094] Several functions in the system may be
combined to present lesson materials to the student. By
combining several functions, the teacher can control the
lesson plan, methods of teaching, and student feedback.
This provides significant flexibility in the system and
puts the user in control. Individualized Educational
Plans can be easily constructed for students, making
compliance with PL94-142 simple for the teacher. An
important feature of the system combines functions of
Text-To-Speech and Facial Animation as a visual aid to
pronunciation and typical facial, mouth, and tongue
movements associated with speech using real examples
from the lessons. This feature is valuable to students
studying a language other than their native language and
also to students working with a speech therapist.

[0095] Special interest or subject content might be
desired in this circumstance. For example, an employee
of a company dealing with automobile parts or an
employee of a medical establishment might want to use
content from the company literature to practice
listening. Then he or she would be able to practice
special words, phrases, etc. that he or she might be
likely to hear in their environment, and therefore would
be interested in understanding.

[0096] The above description merely provides a 
disclosure of particular embodiments of the invention 
and is not intended for the purposes of limiting the same 
thereto. As such, the invention is not limited to only the 
above described embodiments. Rather, it is recognized 
that one skilled in the art could conceive alternative
embodiments that fall within the scope of the invention. 

Claims 

1. A system for interactive language instruction for a
user comprising:

a first module configured to convert input text to
audible speech in a selected language, the
audible speech being patterned after a model;

a user interface configured to receive utterances
spoken by a user in response to a prompt to
replicate the audible speech; and,

a second module configured to recognize the
utterances and provide feedback to the user as
to a precision at which the user replicates the
audible speech in the selected language based
on a comparison of the utterances to one of the
audible speech and the model.

2. The system as set forth in claim 1 further comprising
a third module synchronized to the first module, the
third module producing an animated image of a human
face and head pronouncing the audible speech.

3. The system as set forth in claim 2 wherein the
animated image of the human face and head portrays a
transparent face and head.

4. The system as set forth in claim 2 wherein the first
and third modules further include controls to control
one of the volume, speed, and vocal characteristics of
the video image and the audible speech.

5. The system as set forth in claim 1 wherein the model
is one of a predictive model, phoneme model, a diphone
model, and a dynamically generated model.

6. The system as set forth in claim 1 wherein the first
module includes files storing model pronunciations for
words comprising the input text.

7. The system as set forth in claim 1 further comprising
lesson files wherein the input text is based on data
stored in the lesson files.

8. The system as set forth in claim 1 wherein the input
text is based on data received from a source outside of
the system.

9. The system as set forth in claim 1 wherein the system
further includes dictionary files.

10. The system as set forth in claim 1 wherein the
system further comprises a record and playback module.

11. The system as set forth in claim 1 wherein the
system further includes a table storing mapping
information between word subgroups and vocabulary words.

12. The system as set forth in claim 1 wherein the
system further includes a table storing mapping
information between words and vocabulary words.

13. The system as set forth in claim 1 wherein the
system further includes a table storing mapping
information between words and examples of parts of
speech.

14. The system as set forth in claim 1 wherein the
system further includes tables of punctuation.

15. The system as set forth in claim 1 wherein the
system includes specific pronunciation files.

16. A system comprising:

a first module configured to convert input text to
audible speech in a selected language, the
audible speech indicative of a model;

a second module synchronized to the first module,
the second module producing an animated
image of a human face and head pronouncing
the audible speech;

a user interface positioned to receive utterances
spoken by a user in response to a
prompt to replicate the audible speech; and,

a third module configured to recognize the
utterances and provide feedback to the user as
to a precision at which the user replicates the
speech in the selected language based on a
comparison of the utterances to one of the
audible speech and the model.

17. A method for voice interactive language instruction,
comprising:

converting input text data to audible speech
data;

generating audible speech comprising phonemes
based on the audible speech data;

outputting the audible speech through an audio
output device;

generating an animated image of a face and
head pronouncing the audible speech;

synchronizing the audible speech and the
video image;

prompting the user to replicate the audible
speech;

recognizing utterances generated by the user
in response to the prompting;

comparing the audible speech to the utterances;
and,

providing feedback to the user based on the
comparison.

18. The method as set forth in claim 17 further comprising
receiving the input text from one of a network, a
stored lesson file, a scanner, and the internet.

19. The method as set forth in claim 17 wherein the
feedback comprises providing a playback of
selected portions of the audible speech and
utterances.
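The comparing and feedback steps recited in claim 17 can be illustrated with a minimal sketch. It assumes that the recognizer has already turned the student's utterance into text, and it swaps in a word-level text comparison (Python's difflib) for whatever acoustic comparison an actual embodiment would use. The function names and the 0.8 threshold are hypothetical and do not come from the claims.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case the text and split it into words, dropping edge punctuation."""
    words = (w.strip(".,;:!?\"'") for w in text.lower().split())
    return [w for w in words if w]


def compare_utterance(model_text, recognized_text):
    """Return (similarity ratio, model words the student did not match)."""
    model = normalize(model_text)
    spoken = normalize(recognized_text)
    matcher = SequenceMatcher(a=model, b=spoken, autojunk=False)
    missed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" mark model words absent from the utterance.
        if tag in ("replace", "delete"):
            missed.extend(model[i1:i2])
    return matcher.ratio(), missed


def feedback(model_text, recognized_text, threshold=0.8):
    """Turn the comparison into a short message (threshold is an assumption)."""
    ratio, missed = compare_utterance(model_text, recognized_text)
    if missed:
        return "Practice these words again: " + ", ".join(missed)
    if ratio >= threshold:
        return "Good replication."
    return "Try again."
```

For example, feedback("The quick brown fox", "the quick brown box") reports "fox" as the word to practice, while an exact repetition returns "Good replication."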






EP 1 083 536 A2 







FIG. 2 







FIG. 3 



INPUT TEXT (302)
CONVERT TEXT DATA TO AUDIBLE SPEECH DATA (304)
OUTPUT AUDIBLE SPEECH
GENERATE ANIMATED IMAGE (308)
SYNCHRONIZE AUDIBLE SPEECH AND IMAGE (310)
PROMPT STUDENT (312)
RECOGNIZE UTTERANCES (314)
COMPARE AUDIBLE SPEECH TO UTTERANCES (316)
PROVIDE FEEDBACK (318)






FIG. 4 



SELECT TEXT
TYPE URL
DISPLAY TEXT IN WINDOW (410)
PREPROCESS MARKUP LANGUAGE
PREFETCH SUBSET OF TEXT (NEXT)
PROCESS IN TEXT-TO-SPEECH
CHECK SPEED

Reference numerals also shown: 416, 418, 420, 422.







FIG. 5

SHOW TEACHER PROMPT
CONTINUE FROM FIG. 4 (AFTER STEP 412)
PREFETCH SUBSET OF TEXT (NEXT)
PROCESS IN TEXT-TO-SPEECH
PROCESS IN TEXT TO ANIMATION
CHECK SPEED (508)
CHECK VOLUME (510)
PLAY SOUND (512)
MOVE FACE, ETC.
END (518)

Reference numerals also shown: 502, 504, 506.






FIG. 6 



SELECT PLAY/RECORD (602)

TO FIG. 7






FIG. 8

PLAYBACK (800)

STUDENT:
HIGHLIGHT TEXT
CHECK SPEED
CHECK VOLUME
PLAY SOUND
MOVE CURSOR, MOVE FACE, ETC.
END

HIGHLIGHT TEXT
CHECK SPEED
PROCESS TEXT-TO-SPEECH
PROCESS ANIMATION
CHECK VOLUME

Reference numerals also shown: 812, 822, 824, 826, 830.




FIG. 9 

900

STUDENT ENTERS IDENTIFICATION (902)
STUDENT GRADE LEVEL EVALUATION PROCESS
SCORE RECORDED (906)
LESSON 1 ... N (908)
SCORED LESSON (910)
RECORD UPDATED (912)
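The record keeping outlined in FIG. 9 can be sketched as a small data structure: a student is identified, an evaluation score is recorded, and the record is updated as each lesson is scored. This is only an illustration. The class name, field names, and the averaging helper are hypothetical and are not taken from the drawings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StudentRecord:
    """Hypothetical per-student record keyed by the entered identification."""
    student_id: str
    evaluation_score: Optional[float] = None
    lesson_scores: Dict[int, float] = field(default_factory=dict)

    def record_evaluation(self, score: float) -> None:
        """Store the score from the grade level evaluation."""
        self.evaluation_score = score

    def record_lesson(self, lesson_number: int, score: float) -> None:
        """Update the record with the score for one lesson (1 ... N)."""
        self.lesson_scores[lesson_number] = score

    def average_lesson_score(self) -> Optional[float]:
        """Mean of the lesson scores so far, or None if no lesson is scored."""
        if not self.lesson_scores:
            return None
        return sum(self.lesson_scores.values()) / len(self.lesson_scores)
```

A record created with StudentRecord("s-001") starts with no evaluation score and an empty lesson table, and each call to record_lesson overwrites any earlier score for the same lesson number.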




FIG. 10 



PARAGRAPH 1 DISPLAYED (1002)
STUDENT READS
CONFIDENCE MEASURED (1006)
GRADED/TOTAL PARAGRAPH 1
GRADED/SUB PARAGRAPH ELEMENTS
TABLE LOOK-UP, DETERMINE GRADE (1010)
IF THRESHOLD EXCEEDED THEN (1012)
PARAGRAPH 2 DISPLAYED, ELSE DISPLAY GRADE (1014)
STUDENT READS (1016)
CONFIDENCE MEASURED (1018)
GRADED/TOTAL PARAGRAPH
GRADED/SUB PARAGRAPH ELEMENTS (1020)
TABLE LOOK-UP, DETERMINE GRADE (1022)
SAME AS ABOVE. PARAGRAPH 3, 4, 5, ETC. (1024)
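The evaluation loop of FIG. 10 (measure confidence, look up a grade in a table, and advance to the next paragraph only if a threshold is exceeded) can be sketched as follows. The grade bands, grade labels, and the 0.7 threshold are hypothetical values chosen for illustration. The figure does not give the contents of the look-up table.

```python
import bisect

# Hypothetical look-up table: lower bound of each confidence band and its grade.
GRADE_BOUNDS = [0.0, 0.5, 0.7, 0.85]
GRADE_LABELS = ["D", "C", "B", "A"]


def look_up_grade(confidence):
    """Table look-up: map a confidence in [0, 1] to a grade label."""
    index = bisect.bisect_right(GRADE_BOUNDS, confidence) - 1
    return GRADE_LABELS[max(index, 0)]


def run_evaluation(confidences, threshold=0.7):
    """Grade paragraphs in order, stopping once the threshold is not exceeded.

    confidences holds one measured confidence per paragraph read.
    Returns a list of (paragraph number, grade) for the paragraphs read.
    """
    results = []
    for number, confidence in enumerate(confidences, start=1):
        results.append((number, look_up_grade(confidence)))
        if confidence <= threshold:
            # Threshold not exceeded: display the grade instead of advancing.
            break
    return results
```

With confidences of 0.9, 0.6, and 0.95, the student advances past paragraph 1 and stops at paragraph 2, so the third paragraph is never displayed.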






FIG. 11

OVERALL SCORE EXAMPLE (1100)

THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG'S TAIL
SCORE: 85% 85% 85% 50% 50% 85% 85% 50% 50% 85%

ISSUES IDENTIFIED: FOX, JUMPS, LAZY, DOG'S (1102)

F O X
SCORE: 50% 85% 50% (1104)

DRILL: "f" SOUND, "x" SOUND (1106)

SELECT ELEMENTARY VOCABULARY WORDS WITH "f" SOUND.
SELECT ELEMENTARY VOCABULARY WORDS WITH "x" SOUND. (1108)

REPEAT WITH OTHER WORDS, SOUNDS. (1110)

DRILL WORDS WITH EACH IDENTIFIED SOUND. (1112)

PLAY RECORDED SOUND FROM DICTIONARY AS MODEL FOR STUDENT. (1114)

SCORING DETERMINES BASELINE ALSO.
BASELINE CREATES TABLE FOR LESSON EVALUATIONS. (1116)

TABLE
1 ... 50 ... 40 90
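The issue identification and drill selection shown in FIG. 11 can be sketched as two small steps: flag the scored items that fall below a passing score, then pick vocabulary words containing each flagged sound. The passing score of 70, the vocabulary list, and the function names are hypothetical. Matching a sound by its letter in the spelling is a crude stand-in for a real phoneme look-up.

```python
def identify_issues(scored_items, passing_score=70):
    """Return the items whose score falls below passing_score, in order."""
    return [item for item, score in scored_items if score < passing_score]


def select_drill_words(sounds, vocabulary):
    """Map each problem sound to the vocabulary words whose spelling contains it."""
    return {s: [w for w in vocabulary if s in w.lower()] for s in sounds}


# Scores for the sub-elements of "FOX", as read from FIG. 11.
sound_scores = [("f", 50), ("o", 85), ("x", 50)]

# Hypothetical elementary vocabulary list.
vocabulary = ["fish", "box", "leaf", "six", "cat"]

problem_sounds = identify_issues(sound_scores)
drills = select_drill_words(problem_sounds, vocabulary)
```

Here problem_sounds holds "f" and "x", and drills pairs "f" with "fish" and "leaf" and "x" with "box" and "six". The same identify_issues call works at the word level when it is given (word, score) pairs.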






